Body height estimation from automated length measurements on standing long leg radiographs using artificial intelligence

Artificial intelligence (AI) allows large-scale analyses of long leg radiographs (LLRs). We used this technology to derive an update for the classical regression formulae by Trotter and Gleser, which are frequently used to infer stature based on long-bone measurements. We analyzed calibrated, standing LLRs from 4200 participants taken between 2015 and 2020. Automated landmark placement was conducted using the AI algorithm LAMA™, and the measurements were used to determine femoral, tibial and total leg length. Linear regression equations were subsequently derived for stature estimation. The estimated regression equations have a shallower slope and larger intercept in males and females (Femur-male: slope = 2.08, intercept = 77.49; Femur-female: slope = 1.90, intercept = 79.81) compared to the formulae previously derived by Trotter and Gleser 1952 (Femur-male: slope = 2.38, intercept = 61.41; Femur-female: slope = 2.47, intercept = 54.13) and Trotter and Gleser 1958 (Femur-male: slope = 2.32, intercept = 65.53). All long-bone measurements showed a high correlation (r ≥ 0.76) with stature. The linear equations we derived tended to overestimate stature in short persons and underestimate stature in tall persons. The differences in slopes and intercepts from those published by Trotter and Gleser (1952, 1958) may result from an ongoing secular increase in stature. Our study illustrates that AI algorithms are a promising new tool enabling large-scale measurements.

www.nature.com/scientificreports/
[...] may also lead to high inter-observer variability. A recently published AI-based algorithm automates length and angle measurements on LLRs, which enables the use of much larger datasets and produces standardized outputs 10-12. In general, the application of AI technology in medicine facilitates the analysis of large imaging datasets such as radiographs, computed tomography (CT) scans or magnetic resonance images (MRIs). However, to date, no study has been published on the use of AI technology for height estimation based on radiographic measurements in a large human sample. The frequently used regression formulae derived by Trotter and Gleser in 1952 and 1958 are considered to be suitable for persons of European ancestry 13-15 and are applied to estimate stature from skeletal remains. Trotter and Gleser suggested not to estimate stature by determining the average of estimates obtained from several equations, each of which is based on a different bone or on a combination of bones 1. Although the formulae derived by Trotter and Gleser 1,15 are considered to be quite reliable, these regressions were derived more than half a century ago, and average human stature has increased markedly since then, especially in high-income countries 16,17. It is therefore reasonable to suggest that these formulae may need to be adapted due to the secular change in stature 18.
In this study, we aimed to derive updated regression formulae to infer stature for humans of European ancestry based on long bone measurements from living patients. For this purpose, we used measurements from LLRs taken between 2015 and 2020. We based the regressions on a large sample of more than 4000 adults and applied an AI-based algorithm to acquire tibial length, femoral length and total leg length for this patient sample. We then compared these newly derived regression formulae to existing ones in the literature.

Results
Patient demographics. Of the 4200 LLRs included in the final analysis, 2526 (60.1%) were from female patients and 1674 (39.9%) were from male patients. All included patients were between 18 and 95 years old and born between 1923 and 2002, with a median age of 66 years (Fig. 1). The mean BMI was 29.44 kg/m² (± 5.8 kg/m² SD) and the mean height was 168.9 cm (± 9.6 cm SD). Summary statistics of patient demographic variables and total numbers of left, right and bilateral radiographs used in this study are presented in Table 1. Mean BMI was similar across age groups (Fig. 2). Scatterplots for femur length and stature, separately for males and females, are shown in Fig. 3.
Regression results. The linear regression equations for the estimation of stature in our sample based on either one or two bone lengths are presented in Table 2.
Correlations between stature and long bone lengths were consistently larger than 0.7 for all considered long bones in males and females. For the male sample, this correlation was r = 0.82 (95% CI 0.80-0.83) for the femur, r = 0.80 (0.79-0.82) for the tibia, r = 0.84 (0.82-0.86) for total leg length and r = 0.84 (0.83-0.86) for tibia + femur. The slopes and intercepts are the averages of left and right. The intercept for the femoral regression was 77.49 in our equation for males, compared to an intercept of 65.53 in Trotter and Gleser (1958), and an intercept of 61.41 in Trotter and Gleser (1952). The slope was 2.08 in our equation for males, compared to a slope of 2.32 in Trotter and Gleser (1958) and a slope of 2.38 in Trotter and Gleser (1952), see Fig. 3a.
The correlations between long bone length and stature for the female sample were r = 0.77 (95% CI 0.76-0.79) for the femur, r = 0.76 (0.75-0.78) for the tibia, r = 0.80 (0.78-0.81) for leg length and r = 0.80 (0.78-0.81) for tibia + femur. The intercept for the femoral regression in females was 79.81 in our sample, compared to an intercept of 54.13 in Trotter and Gleser (1952). The slope of the femoral regression for females was 1.90 in our equation, compared to a slope of 2.47 in Trotter and Gleser (1952), see Fig. 3b.
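To make the practical effect of these coefficient differences concrete, the following minimal Python sketch applies the femoral slopes and intercepts quoted in the text. It assumes that femur length and stature are both expressed in centimetres, and the names `FEMUR_EQUATIONS`, `estimate_stature` and `crossover_femur_cm` are hypothetical, introduced here for illustration only.

```python
# Femoral regression coefficients (slope, intercept) as quoted in the text.
# Assumption: femur length and stature are both expressed in cm.
FEMUR_EQUATIONS = {
    ("present study", "male"): (2.08, 77.49),
    ("present study", "female"): (1.90, 79.81),
    ("Trotter and Gleser 1952", "male"): (2.38, 61.41),
    ("Trotter and Gleser 1952", "female"): (2.47, 54.13),
    ("Trotter and Gleser 1958", "male"): (2.32, 65.53),
}


def estimate_stature(femur_cm, source, sex):
    """Return estimated stature (cm) as slope * femur length + intercept."""
    slope, intercept = FEMUR_EQUATIONS[(source, sex)]
    return slope * femur_cm + intercept


def crossover_femur_cm(eq_a, eq_b):
    """Femur length (cm) at which two regression lines predict the same stature."""
    slope_a, intercept_a = FEMUR_EQUATIONS[eq_a]
    slope_b, intercept_b = FEMUR_EQUATIONS[eq_b]
    return (intercept_b - intercept_a) / (slope_a - slope_b)


if __name__ == "__main__":
    femur = 45.0  # hypothetical femur length in cm
    for (source, sex) in FEMUR_EQUATIONS:
        if sex == "male":
            print(f"{source}: {estimate_stature(femur, source, sex):.2f} cm")
    print(crossover_femur_cm(("present study", "male"),
                             ("Trotter and Gleser 1952", "male")))
```

For a hypothetical 45 cm femur, the three male equations give about 171.1 cm (present study), 168.5 cm (1952) and 169.9 cm (1958). The present male line and the 1952 male line cross at a femur length of about 53.6 cm, so under these coefficients the present equation yields the larger estimate for femora shorter than that and the smaller estimate for longer ones.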
Average stature of the male subsample was 177 cm (± 7.4 cm SD), with a range from 148 to 202 cm. For females, average stature was 163 cm (± 6.5 cm SD), ranging from 140 to 189 cm. Detailed tables summarizing the stature distributions as well as the corresponding averages of the femur, tibia, leg and tibia + femur measurements for the male (Table 3) and female (Table 4) samples are included below.
We calculated differences between predicted stature and the mean of the clinically measured stature values for each stature category (each cm) independently, to assess the goodness of fit of the linear regression equations for the different stature categories. We found that the linear equations tended to slightly overestimate stature in short persons, and underestimate stature in tall persons, on average. For example, for male individuals who were two standard deviations (14.8 cm) shorter than the male mean (177 cm), stature was overestimated, on average, by 2.5-3.3% depending on the regression equation. Female individuals who were two standard deviations (13 cm) shorter than the female mean (163 cm) were overestimated, on average, by 3.1-3.7% (5.6 cm based on the femur equation, 5.2 cm based on the leg length equation, 5.1 cm using the tibia + femur equation and 6.1 cm using the tibia equation). Females who were two standard deviations taller than the female mean were underestimated, on average, by 2.6-3.0% (4.9 cm using the femur equation, [...])
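The direction of this pattern is what an ordinary least squares fit would be expected to produce whenever the correlation is below 1. As a minimal sketch, assuming bone length and stature are approximately bivariate normal (an assumption made here for illustration, not a statement from the study), the predicted stature of a person regresses towards the mean by a factor of r², so the expected error (predicted minus actual) for a person z standard deviations from the mean is -(1 - r²) · z · SD. The function name `expected_prediction_error_cm` is hypothetical.

```python
def expected_prediction_error_cm(r, sd_cm, z):
    """Expected (predicted - actual) stature in cm for a person whose actual
    stature lies z standard deviations from the sample mean.

    Assumption: bone length and stature are bivariate normal, in which case
    E[predicted - mean | actual] = r**2 * (actual - mean).
    """
    return -(1.0 - r ** 2) * z * sd_cm


if __name__ == "__main__":
    # Values quoted in the text: male femur r = 0.82, male stature SD = 7.4 cm,
    # male mean stature = 177 cm.
    error_cm = expected_prediction_error_cm(r=0.82, sd_cm=7.4, z=-2)
    actual_cm = 177 - 2 * 7.4
    print(f"{error_cm:.2f} cm, {100 * error_cm / actual_cm:.1f}% of stature")
```

With the male femur values from the text this gives an expected overestimate of about 4.8 cm, roughly 3.0% of 162.2 cm, which lies inside the 2.5-3.3% range reported above. This is a plausibility check under a stated assumption, not a reproduction of the study's own calculation.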

Discussion
In this study, we derived new stature estimation regression formulae based on long bone measurements, which were collected from long leg radiographs of 4200 living Austrians. Measurement was automated by the software LAMA™ 19, an algorithm that places landmarks automatically using artificial intelligence. As expected, our findings confirm that the different long bone lengths show a high correlation (r ≥ 0.76) with stature. Using tibia + femur or leg length resulted in a higher correlation with stature (r = 0.84 in males, r = 0.80 in females) and hence also in a better predictive capacity of the regression formula compared to formulae using femoral or tibial length alone (r = 0.80-0.82 in males, r = 0.76-0.77 in females).
Different stature estimation formulae have been described in the literature for different human populations and geographical areas, such as for Japanese 20, Thai 21, Portuguese 5, Mexicans 22, White US-Americans 15 and Native North Americans 23. The formulae by Trotter and Gleser (1952, 1958) are considered to be most suitable for persons of European ancestry 13,24, but these formulae were established more than half a century ago. As the secular increase in stature has since led to an absolute increase in average stature in most human populations 25-27, a review is warranted to assess whether these formulae require adjustment.
Our results show that the regression lines of the present study, which we derived based on a sample of more than 4000 living Austrians, possess a shallower slope and a larger intercept compared to the formulae derived by Trotter and Gleser (1952, 1958). We suggest that the differences in slopes and intercepts are a consequence of the ongoing secular increase in stature in Europe, where maturation occurs at increasingly younger ages and a larger absolute adult height is reached. The exploitation of the full growth potential during childhood and adolescence is likely a consequence of reduced poverty, better nutrition and better general health 27. This phenomenon shifts the population distribution of stature towards higher mean values. At the same time, human bodies, and especially most of our long bones, do not generally grow isometrically 18,28-30, which implies that the secular increase in stature likely affects the association between stature and the long bones 18,29,31. In particular, the femur shows positive allometric growth 18. Consequently, the secular increase in body size could be the reason for the larger intercept and shallower slope in the femoral regression formula derived in this study compared to the estimates by Trotter and Gleser (1952, 1958). An alternative explanation could be that the observed differences in intercepts and slopes are a consequence of genetic differences between samples, or they could be due to non-random sampling in earlier work. Trotter and Gleser (1952, 1958) used samples of military personnel, which might have been truncated, as those too short would not have been accepted into the military. Their female sample (Trotter and Gleser 1952) from the Terry Collection had uncommonly low stature by today's standards.
This study aimed at updating the existing linear regression formulae for stature estimation. Our results indicate that a linear formula is limited in predicting stature accurately for very short and very tall persons. A further limitation of our study is that the exact measurement method and the anatomical landmarks used differ between radiographic measurements as collected here, which are the standard in radiology, and dry bone measurements, as collected in the studies by Trotter and Gleser (1952, 1958) and as usually done in forensics. In the present study, the length measurement methods described by Waldt et al. 32 were used, as this is the standard in radiological long bone measurements 33,34. We believe that despite the different measurement methods for long bone length in clinical medicine vs. forensics, these formulae have the potential to be applicable in anthropology and forensics. Dry bone length will likely deviate marginally from bone length measured on radiographs because bones shrink slightly when drying (ca. 2 mm difference in long bone length between fresh and dry bone 15). In addition, the position of the long bone on an osteometric board will differ marginally from the position of the femur of a person undergoing a radiograph. However, we expect the resulting measurement differences to be small. Future work could estimate the measurement error when assessing long bone length based on dry vs. wet bone vs. radiographs according to the clinical vs. forensic standard for the same person.
To conclude, we found that the regressions derived here have shallower slopes and larger intercepts compared to formulae from the literature (Trotter and Gleser 1952, 1958). We interpret these differences as a possible consequence of the secular increase in stature. Our study illustrates that AI algorithms are a promising new tool enabling large-scale measurements of bones based on radiographs.

Methods
The study was approved by the institutional ethics review board (Ethics-Committee of the Vinzenz Group EK: 46/2020) and individual informed consent was waived. All data analysed were collected as part of routine diagnosis and treatment. All experiments were performed in accordance with relevant named guidelines and regulations.
We excluded patients with artificial joints, implants, other kinds of metalwork, posttraumatic or pathologic deformities, metabolic bone diseases, LLRs with no presence or visibility of the calibration ball, patients under 18 years of age, LLRs where the algorithm was unable to identify necessary landmarks and patients where stature was not recorded. In total, 4200 LLRs were measured and included in the final analyses.
Image acquisition. LLRs were taken as part of the clinical routine, as they are a standard procedure for preoperative planning and for diagnostic purposes. All LLRs were taken on the same device (DigitalDiagnost X-Ray-System 2.1.3, Philips Healthcare Inc., Andover, MA, USA) and each included a 25 mm calibration ball marker, which was placed medially or laterally of the knee joint.
Table 4. Distribution of stature and means of long bone measurements for females (femur, tibia + femur, leg length, tibia length) for each stature level (each cm). N is the number of individuals at the respective stature level. All other measurements in cm.
The software LAMA™ ([...] Vienna, Austria), which automates angle and length measurements on LLRs and annotates the original DICOM images, was used in this study. This program generates numerical outputs for the three linear distance measurements tibial length, femoral length and total leg length. LAMA™ automatically localizes anatomical features of the femur and tibia as well as the calibration ball to assess the landmarks needed to estimate the measurements. The software was designed to suppress the output if landmarks cannot be placed appropriately. Length calibration was performed by segmenting the calibration ball and calculating a magnification factor based on the size of the calibration ball and the diameter of the segmentation (pixel units). For all LLRs the following linear distance measurements were computed (Fig. 5) 32: leg length (measured as linear distance between top of the femoral head and midpoint of the tibial plafond), maximum femoral length (top of the femoral head to bottom of the femoral medial condyle), and tibial length (midpoint of proximal tibial joint line to midpoint of the tibial plafond).
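The calibration and distance steps described above can be sketched as follows. This is a minimal illustration and not the LAMA™ implementation: it assumes the magnification factor is simply the known ball diameter (25 mm, from the text) divided by the segmented ball diameter in pixels, and that the ball lies in the same plane as the measured bone. The function names and the pixel coordinates in the example are hypothetical.

```python
import math

CALIBRATION_BALL_MM = 25.0  # physical ball diameter stated in the text


def mm_per_pixel(ball_diameter_px, ball_diameter_mm=CALIBRATION_BALL_MM):
    """Scale factor (mm per pixel) from the segmented calibration ball."""
    if ball_diameter_px <= 0:
        raise ValueError("ball diameter in pixels must be positive")
    return ball_diameter_mm / ball_diameter_px


def landmark_distance_cm(point_a_px, point_b_px, scale_mm_per_px):
    """Straight-line distance between two (x, y) pixel landmarks, in cm."""
    dx = point_b_px[0] - point_a_px[0]
    dy = point_b_px[1] - point_a_px[1]
    distance_px = math.hypot(dx, dy)
    return distance_px * scale_mm_per_px / 10.0  # mm -> cm


if __name__ == "__main__":
    scale = mm_per_pixel(ball_diameter_px=100.0)  # hypothetical segmentation
    femoral_head_top = (0.0, 0.0)                 # hypothetical landmark
    medial_condyle_bottom = (0.0, 1800.0)         # hypothetical landmark
    print(landmark_distance_cm(femoral_head_top, medial_condyle_bottom, scale))
```

The same distance function would serve for all three measurements, since each is defined as a straight line between two landmarks; only the landmark pair changes.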
Validation. The AI algorithm applied in this study was validated in a previous study on a smaller dataset of 289 LLRs and showed excellent intra-class correlation between manually and automatically measured lengths 19.
Comparison to existing formulae. The formulae derived in the present study were subsequently compared to existing formulae published by Trotter and Gleser in the 1950s (Trotter and Gleser 1952, 1958). Trotter and Gleser measured samples of US military personnel from the Korean War and from World War II. Stature measurements were recorded at the time of induction into military service. Long bone measurements were con- [...]
[Figure caption, beginning missing] The left vertical axes depict the difference between predicted and mean stature value for the four regression formulae derived here (sigmoidal four parameter logistic curve): femur (grey), leg length (black), tibia + femur (black dotted) and tibia (grey dotted). For very short persons (left tail of the distributions), predicted stature was larger than the mean stature and for tall persons (right tail of the distribution) predicted stature was smaller than the mean stature for all four regression formulae. Dotted lines depict the mean as well as ± 2 standard deviations of the stature distributions.
Statistical analysis. Four ordinary least squares linear regression equations were estimated for stature as dependent variable and for femur, tibia, femur + tibia and total leg length, respectively, as predictor variable. Regressions were estimated separately for males and females. Correlation coefficients between stature and the three variables, leg length, femur length, and tibia length, respectively, were calculated, separately for males and females.
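For readers who want to reproduce this kind of fit on their own data, a simple least-squares line and the Pearson correlation can be computed as in the sketch below. The study itself used IBM-SPSS®; this plain-Python version is only an illustration, the function name `ols_fit` is hypothetical, and the five data points are invented for the example rather than taken from the study.

```python
import math


def ols_fit(x, y):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


if __name__ == "__main__":
    # Invented example values (cm), NOT data from the study.
    femur_cm = [42.0, 44.0, 46.0, 48.0, 50.0]
    stature_cm = [165.0, 168.0, 174.0, 176.0, 182.0]
    slope, intercept, r = ols_fit(femur_cm, stature_cm)
    print(f"stature = {slope:.2f} * femur + {intercept:.2f}, r = {r:.3f}")
```

Following the text, such a fit would be run separately for males and females and once per predictor (femur, tibia, femur + tibia, total leg length).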
To assess the goodness of fit of the linear regression equations, we computed differences between the predicted value and the mean of the clinically estimated stature values for each stature category (for each cm). The resulting differences capture how well the linear model approximates the mean of the clinical measurements for each stature category. To plot the resulting differences, they were approximated by a logistic sigmoidal function (4 parameter logistic regression). P values < 0.05 were considered statistically significant throughout the study. All analyses were performed using IBM-SPSS® version 25 and GraphPad Prism® version 8.
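The per-category comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: stature categories are formed by rounding measured stature to the nearest centimetre, the difference is taken as mean predicted stature minus mean measured stature within each category, and the subsequent four-parameter logistic smoothing is not reproduced. The function name and the three example records are hypothetical.

```python
from collections import defaultdict


def differences_by_stature_category(records, slope, intercept):
    """Mean predicted minus mean measured stature for each 1 cm category.

    records: iterable of (measured_stature_cm, bone_length_cm) pairs.
    Returns a dict {category_cm: difference_cm}, ordered by category.
    """
    grouped = defaultdict(list)
    for stature_cm, bone_cm in records:
        predicted_cm = slope * bone_cm + intercept
        grouped[round(stature_cm)].append((stature_cm, predicted_cm))

    differences = {}
    for category in sorted(grouped):
        pairs = grouped[category]
        mean_measured = sum(m for m, _ in pairs) / len(pairs)
        mean_predicted = sum(p for _, p in pairs) / len(pairs)
        differences[category] = mean_predicted - mean_measured
    return differences


if __name__ == "__main__":
    # Invented records (stature cm, femur cm), NOT data from the study,
    # evaluated with the male femoral equation quoted in the text.
    example = [(170, 45.0), (170, 47.0), (180, 48.0)]
    print(differences_by_stature_category(example, slope=2.08, intercept=77.49))
```

A positive value means the equation overestimates stature in that category on average, and a negative value means it underestimates it.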

Data availability
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.